
    Language beyond the language system: Dorsal visuospatial pathways support processing of demonstratives and spatial language during naturalistic fast fMRI

    Spatial demonstratives are powerful linguistic tools used to establish joint attention. Identifying the meaning of semantically underspecified expressions like “this one” hinges on the integration of linguistic and visual cues, attentional orienting, and pragmatic inference. This synergy between language and extralinguistic cognition is pivotal to language comprehension in general, but especially prominent in demonstratives. In this study, we aimed to elucidate which neural architectures enable this intertwining between language and extralinguistic cognition, using a naturalistic fMRI paradigm. In our experiment, 28 participants listened to a specially crafted dialogical narrative with a controlled number of spatial demonstratives. A fast multiband-EPI acquisition sequence (TR = 388 ms) combined with finite impulse response (FIR) modelling of the hemodynamic response was used to capture signal changes at word-level resolution. We found that spatial demonstratives bilaterally engage a network of parietal areas, including the supramarginal gyrus, the angular gyrus, and the precuneus, implicated in information integration and visuospatial processing. Moreover, demonstratives recruit frontal regions, including the right frontal eye field (FEF), implicated in attentional orienting and reference frame shifts. Finally, using multivariate similarity analyses, we provide evidence for a general involvement of the dorsal (“where”) stream in the processing of spatial expressions, as opposed to ventral pathways encoding object semantics. Overall, our results suggest that language processing relies on a distributed architecture, recruiting neural resources for perception, attention, and extralinguistic aspects of cognition in a dynamic and context-dependent fashion.
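    To make the FIR approach concrete, the sketch below shows how an FIR design matrix can be built and fitted in Python/NumPy: one regressor per post-stimulus lag, with no assumed response shape. The onset times, scan counts, and function names here are hypothetical illustrations, not taken from the study's analysis code.

```python
# Minimal sketch of FIR modelling of the hemodynamic response.
import numpy as np

TR = 0.388          # repetition time in seconds (fast multiband-EPI)
N_SCANS = 2000      # hypothetical number of acquired volumes
N_LAGS = 20         # models ~7.8 s of post-stimulus response (20 * 0.388 s)

def build_fir_design(onsets_s, n_scans=N_SCANS, n_lags=N_LAGS, tr=TR):
    """One column per post-stimulus lag; no canonical HRF is assumed."""
    X = np.zeros((n_scans, n_lags))
    onset_scans = np.round(np.asarray(onsets_s) / tr).astype(int)
    for lag in range(n_lags):
        idx = onset_scans + lag
        idx = idx[idx < n_scans]        # drop events running past the scan
        X[idx, lag] = 1.0
    return X

# Hypothetical demonstrative-word onsets (seconds into the narrative).
onsets = [12.4, 57.1, 103.9, 160.2]
X = build_fir_design(onsets)

# Ordinary least squares per voxel: beta[lag] traces the response shape.
y = np.random.randn(N_SCANS)            # stand-in for one voxel's time series
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
```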

    Phonological Features for 0-shot Multilingual Speech Synthesis

    Code-switching, the intra-utterance use of multiple languages, is prevalent across the world. Within text-to-speech (TTS), multilingual models have been found to enable code-switching. By modifying the linguistic input to sequence-to-sequence TTS, we show that code-switching is possible for languages unseen during training, even within monolingual models. We use a small set of phonological features derived from the International Phonetic Alphabet (IPA), such as vowel height and frontness, and consonant place and manner. This allows the model topology to stay unchanged across languages, and enables new, previously unseen feature combinations to be interpreted by the model. We show that this allows us to generate intelligible, code-switched speech in a new language at test time, including the approximation of sounds never seen in training.
    Comment: 5 pages, to be presented at INTERSPEECH 2020.
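    The core idea lends itself to a small illustration: replacing one-hot phoneme identities with binary phonological feature vectors keeps the input dimensionality fixed across languages, so an unseen sound can be composed from known features at test time. The feature inventory and phoneme entries in this Python sketch are hypothetical and heavily simplified, not the paper's exact feature set.

```python
# Illustrative mapping from IPA-style phonemes to binary feature vectors.
FEATURES = [
    "vowel", "high", "mid", "low", "front", "back", "rounded",
    "consonant", "bilabial", "alveolar", "velar",
    "plosive", "fricative", "nasal", "voiced",
]

PHONEME_FEATURES = {
    "i": {"vowel", "high", "front"},                # close front vowel
    "u": {"vowel", "high", "back", "rounded"},      # close back vowel
    "p": {"consonant", "bilabial", "plosive"},
    "b": {"consonant", "bilabial", "plosive", "voiced"},
    "n": {"consonant", "alveolar", "nasal", "voiced"},
}

def to_feature_vector(phoneme: str) -> list[int]:
    """Fixed-size binary vector: the model topology never changes,
    whatever language the phoneme comes from."""
    active = PHONEME_FEATURES[phoneme]
    return [1 if f in active else 0 for f in FEATURES]

# A sound absent from training can still be expressed by composing known
# features, e.g. a hypothetical voiced velar plosive /g/:
g_vector = [1 if f in {"consonant", "velar", "plosive", "voiced"} else 0
            for f in FEATURES]
```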

    ADEPT: A dataset for evaluating prosody transfer

    Text-to-speech is now able to achieve near-human naturalness, and research focus has shifted to increasing expressivity. One popular method is to transfer the prosody from a reference speech sample. There have been considerable advances in using prosody transfer to generate more expressive speech, but the field lacks a clear definition of what successful prosody transfer means and a method for measuring it. We introduce a dataset of prosodically-varied reference natural speech samples for evaluating prosody transfer. The samples include global variations reflecting emotion and interpersonal attitude, and local variations reflecting topical emphasis, propositional attitude, syntactic phrasing and marked tonicity. The corpus only includes prosodic variations that listeners are able to distinguish with reasonable accuracy, and we report these figures as a benchmark against which text-to-speech prosody transfer can be compared. We conclude the paper with a demonstration of our proposed evaluation methodology, using the corpus to evaluate two text-to-speech models that perform prosody transfer.
    Comment: 5 pages, 1 figure, accepted to Interspeech 2021.
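    As a rough illustration of the benchmark comparison described above, the sketch below computes per-category listener identification accuracy on synthetic speech and sets it against the natural-speech figures. All names and numbers are hypothetical placeholders, not results from the paper.

```python
# Compare listener identification accuracy on TTS output with the
# natural-speech benchmark, per prosodic category.
from collections import defaultdict

def category_accuracy(responses):
    """responses: iterable of (true_category, chosen_category) pairs."""
    correct, total = defaultdict(int), defaultdict(int)
    for true_cat, chosen in responses:
        total[true_cat] += 1
        correct[true_cat] += (chosen == true_cat)
    return {cat: correct[cat] / total[cat] for cat in total}

natural_benchmark = {"emphasis": 0.85, "phrasing": 0.78}   # hypothetical
tts_responses = [("emphasis", "emphasis"), ("emphasis", "phrasing"),
                 ("phrasing", "phrasing"), ("phrasing", "phrasing")]

for cat, acc in category_accuracy(tts_responses).items():
    print(f"{cat}: TTS {acc:.2f} vs natural {natural_benchmark[cat]:.2f}")
```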

    Ctrl-P: Temporal control of prosodic variation for speech synthesis

    Text does not fully specify the spoken form, so text-to-speech models must be able to learn from speech data that vary in ways not explained by the corresponding text. One way to reduce the amount of unexplained variation in training data is to provide acoustic information as an additional learning signal. When generating speech, modifying this acoustic information enables multiple distinct renditions of a text to be produced. Since much of the unexplained variation is in the prosody, we propose a model that generates speech explicitly conditioned on the three primary acoustic correlates of prosody: F0, energy and duration. The model is flexible about how the values of these features are specified: they can be externally provided, or predicted from text, or predicted and then subsequently modified. Compared to a model that employs a variational auto-encoder to learn unsupervised latent features, our model provides more interpretable, temporally-precise, and disentangled control. When automatically predicting the acoustic features from text, it generates speech that is more natural than that from a Tacotron 2 model with reference encoder. Subsequent human-in-the-loop modification of the predicted acoustic features can significantly increase naturalness even further.
    Comment: To be published in Interspeech 2021. 5 pages, 4 figures.
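    A minimal sketch of the three-way conditioning interface described above: F0, energy and duration can be supplied externally, predicted from text, or predicted and then edited before synthesis. Class and function names here are illustrative stand-ins, not the authors' code, and the predictor is a trivial placeholder.

```python
# Sketch of conditioning synthesis on explicit per-phone acoustic features.
from dataclasses import dataclass
from typing import Optional

@dataclass
class AcousticFeatures:
    f0: list        # per-phone F0 values (e.g. Hz)
    energy: list    # per-phone energy values
    duration: list  # per-phone durations (e.g. frames)

def predict_features(text: str) -> AcousticFeatures:
    """Placeholder for the model's text-to-feature predictor."""
    n = len(text.split())   # crude: one feature per word in this sketch
    return AcousticFeatures([120.0] * n, [0.5] * n, [5.0] * n)

def synthesise(text: str, features: Optional[AcousticFeatures] = None):
    # Externally provided features take precedence; otherwise predict them.
    feats = features if features is not None else predict_features(text)
    # ... acoustic model conditioned on (text, feats) would run here ...
    return feats

# Human-in-the-loop edit: predict, then raise F0 on the final word.
feats = predict_features("put it over there")
feats.f0[-1] *= 1.3         # make the final word more prominent
synthesise("put it over there", feats)
```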

    The emergence of systematicity: how environmental and communicative factors shape a novel communication system

    Where does linguistic structure come from? We suggest that systematicity in language evolves adaptively in response to environmental and contextual affordances associated with the practice of communication itself. In two experiments, we used a silent gesture referential game paradigm to investigate environmental and social factors promoting the propagation of systematicity in a novel communication system. We found that structure in the emerging communication systems evolves contingent on structural properties of the environment. More specifically, interlocutors spontaneously relied on structural features of the referent stimuli they communicated about to motivate systematic aspects of the evolving communication system, even when idiosyncratic iconic strategies were equally afforded. Furthermore, we found systematicity to be promoted by the nature of the referent environment: when the referent environment was open and unstable, analytic systematic strategies were more likely to emerge than in stimulus environments with a closed set of referents. Lastly, we found that displacement of communication promoted systematicity. That is, when interlocutors had to communicate about items not immediately present in the moment of communication, they were more likely to evolve systematic solutions, presumably due to working memory advantages. Together, our findings provide experimental evidence for the idea that linguistic structure evolves adaptively from contextually situated language use.